

Section: New Results

Resource Allocation and Scheduling in Large Scale Distributed Platforms.

We have considered several problems arising in the context of large scale platforms, which are characterized by their heterogeneity, the difficulty of predicting performance, and the risk of failures. In [13], we concentrate on heterogeneity issues in collective communication schemes where the goal is to broadcast a message to a set of nodes. In particular, we consider a realistic model for large scale distributed platforms in which some nodes may lie behind NATs or firewalls and may therefore be unable to forward the message to one another. In [20] and [21], we consider resource allocation problems that arise in large scale data centers. In [20], we analyze the main characteristics of the services in a very large trace of an actual data center, recently released by Google. In the same context, in [21], we concentrate on fault tolerance, oversubscribing services in order to guarantee quality of service in a failure-prone environment. Finally, the difficulty of predicting the actual performance of resources has made it very popular to rely on dynamic scheduling algorithms, in which scheduling decisions are made at runtime. In [22], we analyze the performance of such a dynamic scheduling algorithm in terms of the number of induced communications for outer product and matrix multiplication kernels.
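To make the communication metric concrete, the following is a minimal illustrative sketch and not the algorithm analyzed in [22]. It assumes a hypothetical master-worker setting in which task (i, j) of an outer product needs blocks a[i] and b[j], workers keep every block they receive, and a random worker becomes idle at each step. The function name, the two policies, and the idle-worker model are all assumptions introduced for illustration.

```python
import random


def simulate_outer_product(n, num_workers, data_aware, seed=0):
    """Count the blocks sent while scheduling the n*n tasks of an outer product.

    Hypothetical model: task (i, j) needs block a[i] and block b[j], and a
    worker keeps every block it has received. Returns the total number of
    blocks sent to the workers.
    """
    rng = random.Random(seed)
    remaining = {(i, j) for i in range(n) for j in range(n)}
    # Blocks already held by each worker.
    has_a = [set() for _ in range(num_workers)]
    has_b = [set() for _ in range(num_workers)]
    communications = 0

    while remaining:
        # Assumed runtime behaviour: a random worker becomes idle and asks for work.
        w = rng.randrange(num_workers)

        def cost(task):
            # Number of blocks worker w is missing for this task (0, 1 or 2).
            i, j = task
            return (i not in has_a[w]) + (j not in has_b[w])

        candidates = sorted(remaining)
        if data_aware:
            # Greedy choice: the task that needs the fewest new blocks.
            task = min(candidates, key=cost)
        else:
            # Baseline: any remaining task, chosen uniformly at random.
            task = rng.choice(candidates)

        communications += cost(task)
        has_a[w].add(task[0])
        has_b[w].add(task[1])
        remaining.remove(task)

    return communications


if __name__ == "__main__":
    for policy in (False, True):
        sent = simulate_outer_product(n=8, num_workers=4, data_aware=policy)
        print("data-aware" if policy else "random", sent)
```

In this model the count always lies between 2n (every block is sent at least once) and 2n times the number of workers (no worker receives the same block twice), so the two policies can be compared within that range for a given seed.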